In this notebook I will test my setup for generating adversarial noise on images using a more complex dataset than MNIST: Caltech-101. I had originally wanted to use ImageNet, since it is the standard benchmark for most of these models, but my hard drive cannot comfortably hold a 150 GB dataset (it would fill nearly all of the remaining space, which can cause problems when running programs). Since ImageNet is out of reach without hardware upgrades, I will instead use another dataset with many classes: Caltech-101 (link to the dataset: http://www.vision.caltech.edu/Image_Datasets/Caltech101/). I will start by copying the EDA on this data from my sixth biweekly report:
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import time
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from PIL import Image
import torchvision.transforms.functional as TF
import torchvision
import os
import torchvision.transforms as transforms
import progressbar #If you're missing this library it is found here: https://pypi.org/project/progressbar2/
print("Is a GPU available? ")
torch.cuda.is_available()
Is a GPU available?
True
def loadData(path):
    #Used this for help https://stackoverflow.com/questions/973473/getting-a-list-of-all-subdirectories-in-the-current-directory
    paths = sorted([x[0] for x in os.walk(path)][1:])
    classes = []
    imSize = 128
    for i in range(len(paths)):
        classes.append(paths[i][28:])
    nameMap = {}
    outList = []
    targetList = []
    #read in data
    with progressbar.ProgressBar(max_value=len(paths)) as bar:
        for i in range(len(paths)):
            folderName = paths[i]
            numImages = len([name for name in os.listdir(folderName)])
            k = 0
            for file in sorted(os.listdir(folderName)):
                if file.endswith(".jpg"):
                    #read an image into pytorch: https://discuss.pytorch.org/t/how-to-read-just-one-pic/17434/2
                    image = Image.open(os.path.join(folderName, file))
                    transformer = transforms.Compose([transforms.Resize((imSize, imSize)), transforms.ToTensor()])
                    x = transformer(image)
                    x.unsqueeze_(0)
                    #grayscale images get their single channel repeated to form an RGB-shaped tensor
                    if x.shape[1] == 1:
                        x = x.repeat(1,3,1,1)
                    #allocate the output tensor for this class on the first image
                    if k == 0:
                        out = torch.zeros(numImages, x.shape[1], imSize, imSize)
                    out[k,:,:,:] = x
                    k += 1
            outList.append(out)
            targetList.append(i)
            nameMap[i] = classes[i]
            bar.update(i)
    #create target vector, and turn outList into a tensor
    with progressbar.ProgressBar(max_value=len(outList)) as bar:
        for i in range(len(outList)):
            if i == 0:
                out = outList[i]
                target = torch.tensor(targetList[i]).repeat(outList[i].shape[0], 1)
            else:
                out = torch.cat((out, outList[i]), dim = 0)
                target = torch.cat((target, torch.tensor(targetList[i]).repeat(outList[i].shape[0], 1)), dim = 0)
            bar.update(i)
    return out, target, nameMap
data, target, nameMap = loadData("./Data/101_ObjectCategories")
print("The shape of the training data is:")
print(data.shape)
print("The shape of the target is:")
print(target.shape)
100% (102 of 102) |######################| Elapsed Time: 0:00:21 Time: 0:00:21
100% (102 of 102) |######################| Elapsed Time: 0:00:24 Time: 0:00:24
The shape of the training data is:
torch.Size([9145, 3, 128, 128])
The shape of the target is:
torch.Size([9145, 1])
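One detail of the loader worth isolating: Caltech-101 contains some grayscale images, which `loadData` converts to the RGB shape the rest of the pipeline expects by repeating the single channel. A minimal self-contained sketch of that step (the tensor here is random, standing in for a real grayscale image):

```python
import torch

# Hypothetical single-channel tensor, standing in for a grayscale Caltech-101 image
x = torch.rand(1, 1, 128, 128)

# The same trick loadData uses: tile the one channel into an RGB-shaped tensor
if x.shape[1] == 1:
    x = x.repeat(1, 3, 1, 1)

# All three channels are now identical copies of the original
print(x.shape)  # torch.Size([1, 3, 128, 128])
```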
#Made this with the help of this post: https://stackoverflow.com/questions/22127769/python-frequency-of-occurrences
plt.hist(target.numpy(), bins=np.arange(target.numpy().min(), target.numpy().max()+1))
plt.show()
#Get the count for each target
counts = {}
for i in torch.unique(target):
    counts[i] = 0
    for j in range(target.shape[0]):
        if target[j] == i:
            counts[i] += 1
#How many values do we want to print out
top = 5
listOfTopScores = []
#Keep track of the lowest count in our list
minCountInList = 0
for i in counts.keys():
    #If the current count can be put in the list
    if counts[i] > minCountInList:
        #If our list is not full yet
        if len(listOfTopScores) < top:
            listOfTopScores.append((counts[i], nameMap[i.item()]))
        else:
            #Replace the current minimum in the list with the new count
            listOfTopScores.append((counts[i], nameMap[i.item()]))
            for j in range(len(listOfTopScores)):
                if listOfTopScores[j][0] == min(listOfTopScores)[0]:
                    del listOfTopScores[j]
                    break
            minCountInList = min(listOfTopScores)[0]
print("The top " + str(top) + " class counts in the data are:")
print(sorted(listOfTopScores))
The top 5 class counts in the data are: [(435, 'Faces'), (435, 'Faces_easy'), (468, 'BACKGROUND_Google'), (798, 'Motorbikes'), (800, 'airplanes')]
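The manual top-5 bookkeeping above can also be done in two lines with `torch.bincount` and `torch.topk`. A sketch on a small dummy target vector (the tensor here is illustrative, not the real 9145-element target):

```python
import torch

# Dummy target vector: class 2 appears 5 times, class 0 three times, class 1 twice
target = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

counts = torch.bincount(target)                  # per-class frequencies
topCounts, topClasses = torch.topk(counts, k=2)  # the k most common classes
print(topCounts.tolist(), topClasses.tolist())   # [5, 3] [2, 0]
```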
Some classes are overrepresented in this data. This may be a problem, but I don't believe that over- or undersampling would provide much benefit, since either may introduce new bias: oversampling can make the neural net more sensitive to repeated examples, and undersampling discards data that may be important for generalization. Now I will print some grayscale images to get an idea of what the content of the data is:
#copy some code I wrote in week 3 for my CIFAR-10 analysis:
gray_scale_imgs = torch.zeros(data.shape[0], data.shape[2], data.shape[3])
for i in range(data.shape[0]):
    img = data[i,:,:,:]
    img = transforms.ToPILImage(mode='RGB')(img)
    img = transforms.functional.to_grayscale(img)
    gray_scale_imgs[i,:,:] = transforms.ToTensor()(img)
#Plot the gray scaled images
fig, axs = plt.subplots(17, 6, figsize=(20, 60), sharey=True)
fig.tight_layout()
k = 0
l = 0
for i in torch.unique(target):
    #show the first image of each class
    for j in range(data.shape[0]):
        if target[j] == i:
            img = gray_scale_imgs[j,:,:].numpy()
            axs[k,l].imshow(img)
            axs[k,l].set_title(nameMap[i.item()])
            l += 1
            if l == 6:
                l = 0
                k += 1
            break
del(gray_scale_imgs)
There are many classes in this data, but all of them do seem to be the focus of the picture. This is a bias that may actually help our classification neural network, and since the goal is to make it make mistakes, I think this bias is okay to leave in (anything that helps the classification network only makes the generator's job harder). Now I'll look at a histogram of average image values:
avgs = torch.zeros(data.shape[0])
for i in range(data.shape[0]):
    avgs[i] = torch.mean(data[i,:,:,:])
plt.hist(avgs.numpy(), bins = 10)
plt.title("Histogram of Average Image Values")
plt.show()
This looks slightly skewed, so we can try normalizing:
#copied from my week 5 biweekly report:
with progressbar.ProgressBar(max_value=data.shape[2]) as bar:
    for i in range(data.shape[2]):
        for j in range(data.shape[3]):
            #standardize each pixel location across the dataset
            data[:,:,i,j] = (data[:,:,i,j] - torch.mean(data[:,:,i,j], dim = 0))/(torch.std(data[:,:,i,j], dim = 0))
            #min max scaling to rescale it to [0,1]:
            minima = torch.min(data[:,:,i,j])
            maxima = torch.max(data[:,:,i,j])
            data[:,:,i,j] = (data[:,:,i,j] - minima)/(maxima - minima)
        bar.update(i)
avgs = torch.zeros(data.shape[0])
for i in range(data.shape[0]):
    avgs[i] = torch.mean(data[i,:,:,:])
plt.hist(avgs.numpy(), bins = 10)
plt.title("Histogram of Average Image Values")
plt.show()
100% (128 of 128) |######################| Elapsed Time: 0:01:33 Time: 0:01:33
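The per-pixel loop above is slow because it touches one spatial location at a time; the same standardize-then-rescale operation can be done in a single vectorized pass over the whole tensor. A sketch on random data (the shapes mirror the notebook's tensors; this should match the loop up to floating-point differences):

```python
import torch

# Random stand-in for the (N, 3, H, W) image tensor
data = torch.rand(100, 3, 28, 28)

# Standardize each (channel, pixel) location across the dataset, as the loop does
data = (data - data.mean(dim=0, keepdim=True)) / data.std(dim=0, keepdim=True)

# Min-max rescale to [0,1], taken over samples and channels per pixel location,
# matching torch.min(data[:,:,i,j]) in the loop
minima = data.amin(dim=(0, 1), keepdim=True)
maxima = data.amax(dim=(0, 1), keepdim=True)
data = (data - minima) / (maxima - minima)

print(data.min().item(), data.max().item())  # both inside [0, 1]
```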
Normalization did not remove this skew, so I believe this step is unnecessary: the images already appear approximately normal. We can now move on to applying our neural network. Note: I will reimport everything and reload the data from scratch so that I can rerun just this section of code for testing (that way I won't need to rerun the EDA every time):
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import time
import torchvision.models as models
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from PIL import Image
import torchvision.transforms.functional as TF
import torchvision.transforms as transforms
import os
import progressbar #If you're missing this library it is found here: https://pypi.org/project/progressbar2/
print("Is a GPU available? ")
#For reproducibility
torch.backends.cudnn.deterministic = True
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
np.random.seed(0)
torch.cuda.is_available()
Is a GPU available?
True
def loadData(path):
    #Used this for help https://stackoverflow.com/questions/973473/getting-a-list-of-all-subdirectories-in-the-current-directory
    paths = sorted([x[0] for x in os.walk(path)][1:])
    classes = []
    imSize = 28
    for i in range(len(paths)):
        classes.append(paths[i][28:])
    nameMap = {}
    outList = []
    targetList = []
    #read in data
    with progressbar.ProgressBar(max_value=len(paths)) as bar:
        for i in range(len(paths)):
            folderName = paths[i]
            numImages = len([name for name in os.listdir(folderName)])
            k = 0
            for file in sorted(os.listdir(folderName)):
                if file.endswith(".jpg"):
                    #read an image into pytorch: https://discuss.pytorch.org/t/how-to-read-just-one-pic/17434/2
                    image = Image.open(os.path.join(folderName, file))
                    transformer = transforms.Compose([transforms.Resize((imSize, imSize)), transforms.ToTensor()])
                    x = transformer(image)
                    x.unsqueeze_(0)
                    #grayscale images get their single channel repeated to form an RGB-shaped tensor
                    if x.shape[1] == 1:
                        x = x.repeat(1,3,1,1)
                    #allocate the output tensor for this class on the first image
                    if k == 0:
                        out = torch.zeros(numImages, x.shape[1], imSize, imSize)
                    out[k,:,:,:] = x
                    k += 1
            outList.append(out)
            targetList.append(i)
            nameMap[i] = classes[i]
            bar.update(i)
    #create target vector, and turn outList into a tensor
    with progressbar.ProgressBar(max_value=len(outList)) as bar:
        for i in range(len(outList)):
            if i == 0:
                out = outList[i]
                target = torch.tensor(targetList[i]).repeat(outList[i].shape[0], 1)
            else:
                out = torch.cat((out, outList[i]), dim = 0)
                target = torch.cat((target, torch.tensor(targetList[i]).repeat(outList[i].shape[0], 1)), dim = 0)
            bar.update(i)
    return out, target, nameMap
data, target, nameMap = loadData("./Data/101_ObjectCategories")
target = target.view(target.shape[0])
print("The shape of the training data is:")
print(data.shape)
print("The shape of the target is:")
print(target.shape)
100% (102 of 102) |######################| Elapsed Time: 0:00:12 Time: 0:00:12
100% (102 of 102) |######################| Elapsed Time: 0:00:01 Time: 0:00:01
The shape of the training data is:
torch.Size([9145, 3, 28, 28])
The shape of the target is:
torch.Size([9145])
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.1, random_state=4)
#Now we'll look at an image and write a function that will take a tensor and show the plot of the image
def showImage(x):
    img = x.detach().cpu().numpy()
    plt.imshow(np.transpose(img, (1,2,0)), interpolation='nearest')
showImage(train_x[1,:,:,:])
You'll notice that I opted for very substantial downsizing of the images this time. My computer already took a substantial amount of time to work with MNIST, so I wanted the resolution low enough that I could experiment with the model in a reasonable amount of time. This also lets us use a much larger batch size, which helps the speed of my proposed model. We will now train the $\Phi$ neural network:
"""
Net -- defines a neural network that uses resnet18 as a pretrained backbone
"""
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3,3,1)
        self.relu = nn.ReLU()
        #use all but the last 3 layers of a pretrained resnet18 as the backbone
        self.convLayers = torch.nn.Sequential(*(list(models.resnet18(pretrained=True).children())[:-3]))
        self.global_avg = nn.AvgPool2d(2)
        self.fc1 = nn.Linear(256,50)
        self.fc2 = nn.Linear(50,50)
        self.fc3 = nn.Linear(50,102)
    def forward(self,x):
        x = self.relu(self.conv1(x))
        x = self.convLayers(x)
        x = self.global_avg(x)
        x = x.view(-1, 256)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        #Note: nn.CrossEntropyLoss applies log-softmax internally, so this extra softmax
        #compresses the logits; kept here so the code matches the results reported below
        x = torch.softmax(self.fc3(x), dim = 1)
        return x
#train the net:
model = Net().cuda()
criterion = nn.CrossEntropyLoss()
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
batch_size = 64
steps = int(np.floor(train_x.shape[0]/batch_size))
epochs = 100
for epoch in range(epochs):
    running_loss = 0.0
    #shuffle the training data every epoch
    temp = torch.randperm(train_x.shape[0])
    train_x = train_x[temp, :, :, :]
    train_y = train_y[temp]
    for i in range(steps):
        optimizer.zero_grad()
        batch_x = train_x[batch_size*i:batch_size*(i+1),:,:,:].float().cuda()
        batch_y = train_y[batch_size*i:batch_size*(i+1)].cuda()
        out = model(batch_x)
        loss = criterion(out, batch_y.long())
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i == steps-1 and epoch%10 == 9:
            print("The running loss for epoch " + str(epoch+1) + " was " + str(running_loss/steps))
The running loss for epoch 10 was 4.314578659832478
The running loss for epoch 20 was 4.226814195513725
The running loss for epoch 30 was 4.139003651216626
The running loss for epoch 40 was 4.056360835209489
The running loss for epoch 50 was 4.021001955494285
The running loss for epoch 60 was 4.0098840072751045
The running loss for epoch 70 was 4.002568053081632
The running loss for epoch 80 was 3.998543856665492
The running loss for epoch 90 was 3.9907484613358974
The running loss for epoch 100 was 3.9795080088078976
model.eval()
acc = 0
running_loss = 0
#no_grad avoids building the autograd graph during evaluation
with torch.no_grad():
    for i in range(test_x.shape[0]):
        out = model(test_x[i,:,:,:].view(1,3,28,28).float().cuda())
        running_loss += criterion(out, test_y[i].view(1).cuda().long()).item()
        out = torch.argmax(out, dim = 1)
        if out == test_y[i]:
            acc += 1
print("The test accuracy was: " + str(acc/test_y.shape[0]))
print("The test loss was: " + str(running_loss/test_y.shape[0]))
The test accuracy was: 0.5049180327868853
The test loss was: 4.144306813693437
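As an aside, the per-image accuracy loop can be collapsed into a single batched comparison. A self-contained sketch with dummy logits and labels (the values here are illustrative, not from the real model):

```python
import torch

# Dummy logits for 4 test images over 5 classes, plus their true labels
logits = torch.tensor([[2.0, 0.1, 0.0, 0.0, 0.0],
                       [0.0, 3.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0, 0.5],
                       [1.0, 0.0, 0.0, 0.0, 2.0]])
labels = torch.tensor([0, 1, 3, 1])

preds = torch.argmax(logits, dim=1)            # predicted class per image
acc = (preds == labels).float().mean().item()  # fraction classified correctly
print(acc)  # 0.75
```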
This accuracy is not great, but it is also not terrible (especially since there are so many classes). I don't think I need to heavily optimize the accuracy of $\Phi$, since the main point is to trick it, so I'll leave it as is for now. Now I'll implement my GAN setup:
"""
AttentiveFuse -- implements an attentive fuse operation as detailed by https://arxiv.org/pdf/1901.06322.pdf
@params - c_in: the number of input channels (it will output 1/2 this number, since that will keep the number of filters constant
in the large network)
c_out: The number of output channels in the convolutional layer
"""
class AttentiveFuse(nn.Module):
    def __init__(self, c_in, c_out):
        super(AttentiveFuse, self).__init__()
        self.c_in = c_in
        self.c_out = c_out
        #----------
        #The first convolutional layers:
        #----------
        self.convLayers1 = nn.Sequential(nn.Conv2d(self.c_in, self.c_out, 3, padding = 1),
                                         nn.ReLU(),
                                         nn.BatchNorm2d(self.c_out),
                                         nn.Conv2d(self.c_out, 2*self.c_out, 3, padding = 1),
                                         nn.ReLU(),
                                         nn.BatchNorm2d(2*self.c_out),
                                         nn.Conv2d(2*self.c_out, self.c_in, 3, padding = 1),
                                         nn.Sigmoid(),
                                         nn.BatchNorm2d(self.c_in))
        #----------
        #The second convolutional layers:
        #----------
        self.convLayers2 = nn.Sequential(nn.Conv2d(int(self.c_in/2), self.c_out, 3, padding = 1),
                                         nn.ReLU(),
                                         nn.BatchNorm2d(self.c_out),
                                         nn.Conv2d(self.c_out, int(self.c_in/2), 3, padding = 1),
                                         nn.ReLU(),
                                         nn.BatchNorm2d(int(self.c_in/2)))
    def forward(self, x, y):
        #----------
        #concatenate the two inputs
        #----------
        cat = torch.cat((x,y), dim = 1)
        #----------
        #apply the first convolutional layers
        #----------
        cat = self.convLayers1(cat)
        #----------
        #separate the attentive filters and multiply pointwise
        #----------
        x_att = cat[:,0:int(self.c_in/2),:,:]
        y_att = cat[:,int(self.c_in/2):cat.shape[1],:,:]
        x = x*x_att
        y = y*y_att
        #----------
        #add both of the inputs (scaled by attention) together
        #----------
        out = x + y
        #----------
        #Run out through the final convolutional layers
        #----------
        out = self.convLayers2(out)
        return out
"""
SpatialPooling -- Implements attentive spatial pooling as defined by https://arxiv.org/pdf/1901.06322.pdf
@params - c_in - input channels
c_out - number of channels in the inbetween networks
dilationRates - what dilation rates should we use in our convolutions for ASPP
"""
from torch.nn.parameter import Parameter
class SpatialPooling(nn.Module):
    def __init__(self, c_in, c_out = 32, dilationRates = [3,6,12,18]):
        super(SpatialPooling, self).__init__()
        #----------
        #Initialize parameters
        #----------
        self.dilationRates = dilationRates
        self.c_in = c_in
        self.c_out = c_out
        self.atrousConvs = nn.ModuleList([]) #Keeps track of the set of atrous convolutions
        self.fuses = nn.ModuleList([]) #Keeps track of the fusing modules
        #----------
        #Populate the atrousConvs and fuses module lists
        #We need 1 less fusion module than we have atrous convolutions (so we have an if statement to take care of this)
        #Each atrous conv in this module will follow the form atrous conv -> relu -> batch normalization
        #----------
        for i in self.dilationRates:
            if i != self.dilationRates[0]:
                self.fuses.append(AttentiveFuse(self.c_out*2, 32))
            self.atrousConvs.append(nn.Sequential(nn.Conv2d(self.c_in, self.c_out, 3, padding = i, dilation = i),
                                                  nn.ReLU(),
                                                  nn.BatchNorm2d(self.c_out)))
        #----------
        #In order to apply the residual connection we will need to reduce the dimension of the output to have the same number of filters as
        #the input
        #----------
        self.convReduce = nn.Sequential(nn.Conv2d(self.c_out, self.c_in, 1),
                                        nn.ReLU(),
                                        nn.BatchNorm2d(self.c_in))
        #----------
        #Initialize the gamma in the residual scaling
        #----------
        self.gamma = Parameter(torch.zeros(1,1))
        self.gamma = nn.init.xavier_uniform_(self.gamma)
    def forward(self, x):
        #----------
        #Save the original x for the skip connection
        #----------
        x_mem = x.clone()
        #----------
        #Do the atrous convolutions, keeping track of each one in a list
        #----------
        savedConvs = []
        for i in range(len(self.atrousConvs)):
            savedConvs.append(self.atrousConvs[i](x))
        #----------
        #Fuse together all of the atrous convolutions
        #----------
        for i in range(len(self.fuses)):
            if i == 0:
                out = self.fuses[i](savedConvs[i], savedConvs[i+1])
            else:
                out = self.fuses[i](out, savedConvs[i+1])
        #----------
        #Reduce the number of filters in out so that it is the same as that of the input
        #----------
        out = self.convReduce(out)
        #----------
        #Create a residual connection with a learned scaling
        #----------
        out = out*self.gamma + (1-self.gamma)*x_mem
        return out
"""
encoderBlock - defines a single encoder block for the generator
@params: Cin - int, the number of input channels
Cout - int, the number of output channels
batchNorm - bool, tells us if we should use batch normalization for this layer or not
@methods: forward: defines the forward pass of the model
"""
class encoderBlock(nn.Module):
    def __init__(self, Cin, Cout, batchNorm = True):
        super(encoderBlock, self).__init__()
        self.conv = nn.Conv2d(Cin, Cout, 4, stride = 2, padding = 1) #reduces dim by 1/2
        self.BN = nn.BatchNorm2d(Cout)
        self.leReLU = nn.LeakyReLU(.2)
        self.batchNorm = batchNorm
    def forward(self, x):
        if self.batchNorm:
            x = self.leReLU(self.BN(self.conv(x)))
        else:
            x = self.leReLU(self.conv(x))
        return x
"""
decoderBlock - defines a decoder block for the generator
@params: Cin - int, the number of input channels
Cout - int, the number of output channels
useDrop - bool, whether or not to use a dropout layer
batchNorm - bool, whether or not to use batch normalization
@methods: forward(self, x): defines the forward pass of the model
"""
class decoderBlock(nn.Module):
    def __init__(self, Cin, Cout, useDrop = True, batchNorm = True):
        super(decoderBlock, self).__init__()
        self.tConv = nn.ConvTranspose2d(Cin, Cout, 4, stride = 2, padding = 1) #upsamples by 2
        self.BN = nn.BatchNorm2d(Cout)
        self.drop = nn.Dropout(.5)
        self.useDrop = useDrop
        self.batchNorm = batchNorm
    def forward(self, x):
        if self.useDrop and self.batchNorm:
            x = self.drop(self.BN(self.tConv(x)))
        elif self.batchNorm:
            x = self.BN(self.tConv(x))
        else:
            x = self.tConv(x)
        return x
"""
Generator - defines a generator for our image translations task, expects images to be 512x512
@params - None
@methods: forward(self, x): Defines the forward pass of the model
"""
class Generator(nn.Module):
def __init__(self):
super(Generator, self).__init__()
#-----------
#U Net part
#-----------
#define our encoder blocks:
#28x28
self.enc1 = encoderBlock(3, 128, False)
#14x14
self.enc2 = encoderBlock(128, 128)
#7x7
#this is probably small enough to start adding decoder blocks (I was not able to make the full model due to RAM issues)
self.dec1 = decoderBlock(128,128, False)
#14x14
self.dec2 = decoderBlock(128*2, 3, False, False)
#28x28
#-----------
#ASPP Part
#-----------
self.ASPP = SpatialPooling(3)
self.finalConv = nn.Sequential(nn.Conv2d(6,3,1),
nn.BatchNorm2d(3))
self.relu = nn.ReLU()
self.tanh = nn.Tanh()
def forward(self, x):
x_mem = x.clone()
#----------
#ASPP on the input
#----------
xASPP = x.clone()
xASPP = self.ASPP(xASPP)
#list of the inputs to use in U-net
temp = []
#----------
#contracting path
#----------
x = self.enc1(x)
#append for use in U-net
temp.append(x)
x = self.enc2(x)
#----------
#dilating path
#----------
x = self.dec1(x)
x = F.relu(torch.cat((x, temp[0]), dim = 1))
#output
x = self.relu(self.dec2(x))
x = self.tanh(self.finalConv(torch.cat((x,xASPP), dim = 1)))
x_mem = x+x_mem
return x_mem, x
"""
Discriminator - implement a discriminator. This uses the skeleton from my thrid biweekly report
"""
class Discriminator(nn.Module):
#Note that exImg will contain and example image which we can get the dimension of NOTE: this should have dimension (1, 1, height, width)
#batchSize will be the size of the batches
def __init__(self):
super(Discriminator, self).__init__()
self.C64 = encoderBlock(3, 64, False)
self.C128 = encoderBlock(64, 64)
self.fc1 = nn.Linear(64*7*7, 1)
def forward(self, x):
x = self.C64(x)
x = self.C128(x)
x = x.view(-1,64*7*7)
x = torch.sigmoid(self.fc1(x))
return x
"""
NegativeSLoss -- Calculates the sparsity loss for a tensor of negative values
@params: - x, a tensor of only negative values
"""
def NegativeSLoss(x):
eps = 10**-6
loss = x*torch.log(torch.abs(x) + eps)
return loss
"""
PostiveSLoss -- Calculates the sparsity loss for a tensor of positive values
@params: - x, a tensor of only positive values
"""
def PostiveSLoss(x):
eps = 10**-6
loss = -x*torch.log(x + eps)
return loss
"""
SparseLoss -- Calculates the sparse loss for a tensor, returns a single value
@params: - x, a tensor of values that we would like the sparse loss of
"""
def SparseLoss(x):
nLoss = NegativeSLoss(x*(x<0))
pLoss = PostiveSLoss(x*(x>0))
loss = torch.mean(nLoss+pLoss)
return loss
"""
getRealSamples -- draws a random minibatch of real images and their labels
"""
def getRealSamples(data, y, batch_size):
    #sample batch_size random indices from the whole dataset
    #(torch.randperm(batch_size) would only ever reshuffle the first batch_size images)
    temp = torch.randperm(data.shape[0])[:batch_size]
    result = data[temp, :, :, :].float()
    y = y[temp].float()
    return result, y
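To sanity-check the intuition behind the sparsity loss: the entropy-style term is near zero for values close to 0 or 1 and largest in between, so minimizing it pushes each perturbation value toward being either zero (sparse) or saturated. A quick self-contained check of the positive branch, with its form re-derived inline rather than calling the function defined above:

```python
import torch

eps = 10**-6

def posBranch(x):
    # same form as the positive-branch sparsity loss above: -x*log(x + eps)
    return -x * torch.log(x + eps)

nearEnds = posBranch(torch.tensor([0.01, 0.99]))  # close to 0 or 1: small loss
middle = posBranch(torch.tensor([0.5]))           # mid-range: larger loss
print(nearEnds.tolist(), middle.tolist())
```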
criterion = nn.CrossEntropyLoss()
gen = Generator().cuda()
dis = Discriminator().cuda()
batch_size = 64
#define our two optimizers
learning_rate = 5e-5
#Use two separate RMSprop optimizers, one for the generator and one for the discriminator
gOptimizer = torch.optim.RMSprop(gen.parameters(), lr=learning_rate)
dOptimizer = torch.optim.RMSprop(dis.parameters(), lr=learning_rate)
#Now we will define the training loop:
epochs = 10
batchesPerEpoch = int(np.floor(train_x.shape[0]/batch_size))
with progressbar.ProgressBar(max_value=epochs*batchesPerEpoch) as bar:
    for epoch in range(epochs):
        running_gen_loss = 0
        for i in range(batchesPerEpoch):
            epsilon = 10**-6
            #get a minibatch of the real data
            batch_x, batch_y = getRealSamples(train_x, train_y, batch_size)
            batch_x = batch_x.cuda()
            batch_y = batch_y.cuda()
            #----------
            #Train discriminator on real data
            #----------
            dOptimizer.zero_grad()
            Dx = dis(batch_x)
            dxLoss = -1*torch.mean(torch.log(Dx + epsilon))
            #find the gradients for this step
            dxLoss.backward()
            #----------
            #Train discriminator on generated data
            #----------
            Gz, _ = gen(batch_x.float())
            DGz = dis(Gz)
            dgzLoss = 1*torch.mean(torch.log(1 - DGz + epsilon))
            dgzLoss.backward()
            dOptimizer.step()
            #----------
            #Get the losses for the generator
            #----------
            gOptimizer.zero_grad()
            #----------
            #Get the mask and generated image from the generator for this step
            #----------
            batch_x, batch_y = getRealSamples(train_x, train_y, batch_size)
            batch_x = batch_x.cuda()
            batch_y = batch_y.cuda()
            xr, r = gen(batch_x.float())
            #----------
            #Find the loss and grads for sparsity and size
            #----------
            loss_sparse = 1000*SparseLoss(r)
            loss_sparse.backward(retain_graph=True)
            #Apply two norms here since we need a matrix norm then a vector norm to get an accurate gauge of the distance of a 3D tensor
            loss_size = 10*torch.mean(torch.norm(torch.norm(r, dim = (2,3)), dim = 1))
            loss_size.backward(retain_graph=True)
            #----------
            #Find the loss against the discriminator
            #----------
            loss_dis = torch.mean(torch.log(dis(xr) + epsilon))
            loss_dis.backward(retain_graph=True)
            #----------
            #Find the classification loss
            #----------
            loss_adv = -10*criterion(model(xr), batch_y.long())
            loss_adv.backward()
            gOptimizer.step()
            bar.update(i + batchesPerEpoch*epoch)
print("Done training")
print("An example of a generated image:")
showImage(xr[0,:,:,:].cpu())
100% (1280 of 1280) |####################| Elapsed Time: 0:34:54 Time: 0:34:54
Done training
An example of a generated image:
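A side note on the `loss_size` term used above: the nested norm (a Frobenius norm over each channel, then an L2 norm across channels) is mathematically identical to the plain L2 norm of each flattened perturbation, which is a quick way to verify what that line actually measures. A sketch on random data standing in for a batch of perturbations:

```python
import torch

r = torch.randn(8, 3, 28, 28)  # a batch of hypothetical perturbations

# Frobenius norm per (image, channel), then L2 norm across channels...
nested = torch.norm(torch.norm(r, dim=(2, 3)), dim=1)
# ...equals the L2 norm of each image's flattened perturbation
flat = torch.norm(r.view(r.shape[0], -1), dim=1)

print(torch.allclose(nested, flat, atol=1e-4))  # True
```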
print("The image with no noise:")
showImage(batch_x[0,:,:,:].cpu())
The image with no noise:
acc = 0
running_loss = 0
for i in range(test_x.shape[0]):
    with torch.no_grad():
        inp, _ = gen(test_x[i,:,:,:].view(1,3,28,28).float().cuda())
        out = model(inp)
        running_loss += criterion(out, test_y[i].view(1).cuda().long()).item()
        out = torch.argmax(out, dim = 1)
        if out == test_y[i]:
            acc += 1
print("The test accuracy after using the GAN was: " + str(acc/test_y.shape[0]))
print("The test loss after using the GAN was: " + str(running_loss/test_y.shape[0]))
The test accuracy after using the GAN was: 0.053551912568306013
The test loss after using the GAN was: 4.588036131467976
showImage(torch.mean(r.cpu(), dim = 0))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
These results are actually quite amazing, and much closer to what we expected: imperceptible noise that changes the output of the neural net $\Phi$. However, we can't get too excited, because the loss on the normal test images and the loss on the images with our generated adversarial noise added aren't drastically different. This is most likely because we trick the network by minimizing the loss associated with the correct class: $\Phi$ is never all too sure about its answer on this dataset, so our generator's job is much easier than on MNIST, because it doesn't need to push the correct-class probability down nearly as far. Still, these results are quite good, especially since the generator never saw the test images, yet it still found very low magnitude noise that almost always causes $\Phi$ to misclassify an image it would normally have gotten right.
It would appear that, for more complex data, it is actually easier for our generator to fool $\Phi$. Since we are minimizing the probability that $\Phi$ is correct, it is much easier to make an already-unsure network less sure than to make a very confident network unsure. Given more computational resources I would like to build a $\Phi$ that classifies much better than this one and see whether we could still get these results, but that seems out of my reach for now.